Mandarin Electro-Laryngeal Speech Enhancement Using Cycle-Consistent Generative Adversarial Networks

Authors

Abstract

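Electro-laryngeal (EL) speech has poor intelligibility and naturalness, which hampers the widespread use of the electro-larynx. Voice conversion (VC) can enhance EL speech. However, if the EL speech to be enhanced is Mandarin, with its complicated tone variation rules, the enhancement is less effective. This is because the source speech (Mandarin EL speech) and the target speech (normal speech) are not strictly parallel. We propose using cycle-consistent generative adversarial networks (CycleGAN, a parallel-free VC framework) to enhance continuous Mandarin EL speech, which can solve the above problem. In the proposed framework, the generator is designed based on a 2D-Conformer-1D-Transformer-2D-Conformer neural network. We use the Mel-Spectrogram instead of traditional acoustic features (fundamental frequency, Mel-Cepstrum parameters and aperiodicity parameters). Finally, the enhanced Mel-Spectrogram is converted into waveform signals using WaveNet. We undertook both subjective and objective tests to evaluate the proposed approach. Compared with traditional approaches to enhancing continuous Mandarin EL speech with variable tone (the average tone accuracy being 71.59% and the average word error rate being 10.85%), our framework increases the average tone accuracy by 12.12% and reduces the average errors of word perception by 9.15%. Compared with approaches towards continuous Mandarin EL speech with fixed tone (the average tone accuracy being 29.89% and the average word error rate being 10.74%), our framework increases the average tone accuracy by 42.38% and reduces the average errors of word perception by 8.59%. Our proposed framework can effectively address the problem that the source and target speech are not strictly parallel. The intelligibility and naturalness of Mandarin EL speech have been further improved.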
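The abstract reports word error rates without saying how they were computed. As a rough illustration only, the conventional definition (token-level edit distance divided by reference length) can be sketched as follows. The function name `word_error_rate` and the choice to pass pre-tokenized lists are assumptions made for this sketch, not details from the paper; for Mandarin, tokens might be characters or segmented words.

```python
def word_error_rate(reference, hypothesis):
    """Conventional WER: (substitutions + deletions + insertions) / len(reference).

    Both arguments are lists of tokens. How to tokenize Mandarin text
    (by character or by word) is left to the caller; that choice is an
    assumption of this sketch, not something stated in the abstract.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# One substitution and one deletion against a 4-token reference.
wer = word_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.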


Similar resources

Automatic Colorization of Grayscale Images Using Generative Adversarial Networks

Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...


Robust Speech Recognition Using Generative Adversarial Networks

This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Unlike previous methods, the new framework does not rely on domain expertise or sim...


SEGAN: Speech Enhancement Generative Adversarial Network

Current speech enhancement techniques operate on the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are being increasingly used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of ge...


Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. By using a recurrent neural network, we achieved mouth landmarks based on audio features. We exploited the power of conditional generative adversarial networks to produce highly-realistic face conditioned on a set of landmarks. These two networks together are capable of produ...


Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

We propose a parallel-data-free voice conversion (VC) method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is general-purpose, high quality, and parallel-data-free, which works without any extra data, modules, or alignment procedure. It is also noteworthy that it avoids over-smoothing, which occurs in many conventional statistical mode...



Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13010537